Existing approaches for vision-and-language navigation (VLN) are mainly based on cross-modal reasoning over discrete views. However, this scheme may hamper an agent's spatial and numerical reasoning because of incomplete objects within a single view and duplicate observations across views. A potential solution is mapping discrete views into a unified bird's-eye view, which can aggregate partial and duplicate observations. Existing metric maps could achieve this goal, but they suffer from less expressive semantics (e.g. usually predefined labels) and limited map size, which weakens an agent's language grounding and long-term planning ability. Inspired by the robotics community, we introduce hybrid topo-metric maps into VLN, where a topological map is used for long-term planning and a metric map for short-term reasoning. Beyond mapping with more expressive deep features, we further design a pre-training framework via the hybrid map to learn language-informed map representations, which enhances cross-modal grounding and facilitates the final language-guided navigation goal. Extensive experiments demonstrate the effectiveness of the map-based route for VLN, and the proposed method sets a new state of the art on three VLN benchmarks.
Accurate localization is fundamental to autonomous driving. Traditional visual localization frameworks approach the semantic map-matching problem with geometric models, which rely on complex parameter tuning and thus hinder large-scale deployment. In this paper, we propose BEV-Locator: an end-to-end visual semantic localization neural network using multi-view camera images. Specifically, a visual BEV (Bird's-Eye-View) encoder extracts and flattens the multi-view images into BEV space, while the semantic map features are structurally embedded as a sequence of map queries. Then a cross-model transformer associates the BEV features with the semantic map queries. The localization information of the ego car is recursively queried out by cross-attention modules. Finally, the ego pose is inferred by decoding the transformer outputs. We evaluate the proposed method on the large-scale nuScenes and Qcraft datasets. The experimental results show that BEV-Locator is capable of estimating vehicle poses under versatile scenarios and effectively associates the cross-model information from multi-view images and global semantic maps. The experiments report satisfactory accuracy, with mean absolute errors of 0.052m, 0.135m and 0.251$^\circ$ in lateral translation, longitudinal translation and heading angle, respectively.
Adapter Tuning, which freezes the pretrained language models (PLMs) and fine-tunes only a few extra modules, has become an appealing and efficient alternative to full model fine-tuning. Although computationally efficient, recent Adapters often increase their parameters (e.g. the bottleneck dimension) to match the performance of full model fine-tuning, which we argue goes against their original intention. In this work, we re-examine the parameter efficiency of Adapters through the lens of network pruning (we name this plug-in concept \texttt{SparseAdapter}) and find that SparseAdapter can achieve comparable or better performance than standard Adapters when the sparse ratio reaches up to 80\%. Based on our findings, we introduce an easy but effective setting, ``\textit{Large-Sparse}'', to improve the model capacity of Adapters under the same parameter budget. Experiments on five competitive Adapters upon three advanced PLMs show that, with a proper sparse method (e.g. SNIP) and ratio (e.g. 40\%), SparseAdapter can consistently outperform its corresponding counterpart. Encouragingly, with the \textit{Large-Sparse} setting, we can obtain further appealing gains, even outperforming full fine-tuning by a large margin. Our code will be released at: https://github.com/Shwai-He/SparseAdapter.
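To make the idea of a sparse ratio concrete, here is a minimal sketch of pruning an adapter weight matrix. It is an illustration only: it uses plain magnitude pruning rather than SNIP or any method from the paper, and the names `magnitude_prune` and `adapter_down` are hypothetical.

```python
def magnitude_prune(weights, sparse_ratio):
    """Return a copy of `weights` with the smallest-magnitude entries zeroed.

    `sparse_ratio` is the fraction of entries to remove (0.8 removes 80%).
    Magnitude pruning is an assumption made for illustration; it is not SNIP.
    """
    # Rank every entry by absolute value, remembering its position.
    entries = sorted(
        (abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)
    )
    n_prune = int(len(entries) * sparse_ratio)
    pruned = [list(row) for row in weights]
    for _, i, j in entries[:n_prune]:
        pruned[i][j] = 0.0
    return pruned


def count_nonzero(weights):
    """Number of parameters that survive pruning."""
    return sum(1 for row in weights for w in row if w != 0.0)


# Hypothetical 2x3 down-projection of an adapter.
adapter_down = [[0.9, -0.05, 0.3], [0.01, -0.7, 0.2]]
sparse = magnitude_prune(adapter_down, 0.5)
```

Under the same budget of non-zero parameters, a Large-Sparse configuration would start from a larger matrix and prune it at a higher ratio, so `count_nonzero` stays fixed while the dense shape grows.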
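It is well known that a reliable and robust framework linking multi-agent deep reinforcement learning algorithms with practical multi-robot applications is hard to come by. To fill this gap, we propose and build an open-source framework for multi-robot systems called MultiRoboLearn. The framework builds a unified setup for simulation and real-world applications. It aims to provide standard, easy-to-use simulated scenarios that can also be deployed easily to real-world multi-robot environments. In addition, the framework gives researchers a benchmark system for comparing the performance of different reinforcement learning algorithms. We demonstrate the generality, scalability, and capability of the framework using different types of multi-agent deep reinforcement learning algorithms in discrete and continuous action spaces.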
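Unsupervised sentence embedding learning has recently been dominated by contrastive learning methods (e.g. SimCSE), which keep positive pairs similar and push negative pairs apart. The contrast operation aims to keep as much information as possible by maximizing the mutual information between positive instances, which leads to redundant information in the sentence embeddings. To address this problem, we present an information-minimization-based contrastive learning (InforMin-CL) model for unsupervised sentence representation learning, which retains useful information and discards redundant information by simultaneously maximizing the mutual information and minimizing the information entropy between positive instances. Specifically, we find that information minimization can be achieved with simple contrast and reconstruction objectives. The reconstruction operation reconstitutes one positive instance from the other positive instance in order to minimize the information entropy between positive instances. We evaluate our model on downstream tasks, including both supervised and unsupervised (semantic textual similarity) tasks. Extensive experimental results show that InforMin-CL achieves state-of-the-art performance.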
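The contrast objective mentioned above can be sketched as an InfoNCE-style loss over in-batch positives and negatives. This is a minimal illustration under stated assumptions: the temperature of 0.05 and the function names are chosen for the example, and the reconstruction objective of InforMin-CL is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss: anchors[i] should match positives[i]; every other
    row of `positives` acts as an in-batch negative for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Two toy sentence embeddings and their (hypothetical) positive views.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(anchors, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is close to zero when each anchor is most similar to its own positive and grows when positives are mismatched, which is the behaviour the contrast operation relies on.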
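This paper introduces Honor of Kings Arena, a reinforcement learning (RL) environment based on Honor of Kings, one of the world's most popular games. Compared with the environments studied in most previous work, ours poses new generalization challenges for competitive reinforcement learning. It is a multi-agent problem in which one agent competes against its opponent, and it requires generalization ability because the agent has diverse targets to control and diverse opponents to compete against. We describe the observation, action, and reward specifications for the Honor of Kings domain and provide an open-source Python-based interface for communicating with the game engine. We provide twenty target heroes with a variety of tasks in Honor of Kings Arena and present initial baseline results for RL-based methods with feasible computing resources. Finally, we showcase the generalization challenges posed by Honor of Kings Arena and possible remedies for them. All of the software, including the environment class, is publicly available at https://github.com/tencent-ailab/hok_env. The documentation is available at https://aiarena.tencent.com/hok/doc/.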
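Accurate traffic condition prediction provides a solid foundation for vehicle-environment coordination and traffic control tasks. Because of the complex spatial distribution of road network data and the diversity of deep learning methods, it is challenging to define traffic data effectively and to capture the complex spatial nonlinear features in the data adequately. This paper applies two hierarchical graph pooling approaches to the traffic prediction task in order to reduce graph information redundancy. First, the paper verifies the effectiveness of hierarchical graph pooling methods for traffic prediction by comparing their predictive performance with that of other baselines. Second, two mainstream hierarchical graph pooling methods, node clustering pooling and node drop pooling, are applied to analyze their strengths and weaknesses in traffic prediction. Finally, for the graph neural networks above, the paper compares how different graph network inputs affect traffic prediction accuracy. Effective ways of defining graph networks are analyzed and summarized.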
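As a rough illustration of node drop pooling, the sketch below keeps the highest-scoring nodes of a graph and the edges among them. It is not the paper's implementation: in practice the scores are learned, whereas here they are given, and `node_drop_pool` and the toy graph are hypothetical.

```python
import math


def node_drop_pool(features, adjacency, scores, ratio=0.5):
    """Keep the top `ratio` fraction of nodes by score.

    Returns the kept node indices (in original order), their features, and
    the adjacency matrix restricted to the kept nodes.
    """
    n = len(features)
    k = max(1, math.ceil(n * ratio))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    pooled_features = [features[i] for i in keep]
    pooled_adjacency = [[adjacency[i][j] for j in keep] for i in keep]
    return keep, pooled_features, pooled_adjacency


# Toy road graph: 4 sensors, one traffic feature each.
features = [[10.0], [20.0], [30.0], [40.0]]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]
scores = [0.1, 0.9, 0.4, 0.8]  # assumed importance scores
keep, pooled_features, pooled_adjacency = node_drop_pool(features, adjacency, scores)
```

Node clustering pooling differs in that it merges nodes into clusters through an assignment matrix instead of discarding them, so no node's information is dropped outright.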
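We introduce an explorative active learning (AL) algorithm based on Gaussian process regression and a marginalized graph kernel (GPR-MGK) to explore chemical space at minimum cost. Using high-throughput molecular dynamics simulation to generate data and a graph neural network (GNN) to make predictions, we construct an active learning molecular simulation framework for thermodynamic property prediction. Specifically, targeting 251,728 alkane molecules consisting of 4 to 19 carbon atoms and their liquid physical properties (density, heat capacity, and vaporization enthalpy), we use the AL algorithm to select the most informative molecules to represent the chemical space. Validation on computational and experimental test sets shows that only 313 molecules (0.124\% of the total) are sufficient to train an accurate GNN model with $\rm R^2 > 0.99$ on the computational test sets and $\rm R^2 > 0.94$ on the experimental test sets. We highlight two advantages of the presented AL algorithm: compatibility with high-throughput data generation and reliable uncertainty quantification.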
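The following sketch shows one common form of uncertainty-driven selection with a Gaussian process: pick the candidate whose posterior variance is largest. It rests on several assumptions: an RBF kernel on scalar descriptors stands in for the marginalized graph kernel, the selection rule is a generic maximum-variance criterion that may differ from the paper's, and all names are hypothetical.

```python
import math


def rbf(a, b, length=1.0):
    """RBF kernel on scalars (a stand-in for a graph kernel on molecules)."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def posterior_variance(x, labeled, noise=1e-6):
    """GP posterior variance at x: k(x,x) - k*^T (K + noise*I)^-1 k*."""
    if not labeled:
        return rbf(x, x)
    gram = [
        [rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(labeled)]
        for i, a in enumerate(labeled)
    ]
    k_star = [rbf(x, a) for a in labeled]
    alpha = solve(gram, k_star)
    return rbf(x, x) - sum(k * a for k, a in zip(k_star, alpha))


def select_most_uncertain(pool, labeled):
    """Return the pool candidate with the largest posterior variance."""
    return max(pool, key=lambda x: posterior_variance(x, labeled))


pool = [0.0, 0.5, 1.0, 5.0]   # hypothetical 1-D molecular descriptors
labeled = [0.0]               # already simulated
chosen = select_most_uncertain(pool, labeled)
```

Because the posterior variance does not depend on the measured property values, candidates can be ranked before any simulation is run, which is one reason variance-based selection pairs naturally with batch data generation.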
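Story ending generation aims to generate reasonable endings for a given story context. Most existing studies in this area focus on generating coherent or diversified story endings, while ignoring that different characters may lead to different endings for the same story. In this paper, we propose a Character-oriented Story Ending Generator (CoSEG) to customize an ending for each character in a story. Specifically, we first propose a character modeling module that learns the personalities of characters from their descriptive experiences extracted from the story context. Then, inspired by the ion exchange mechanism in chemical reactions, we design a novel vector breaking/forming module that learns the intrinsic interactions between each character and the corresponding context through an analogous information exchange procedure. Finally, we use an attention mechanism to learn effective character-specific interactions and feed each interaction into a decoder to generate character-oriented endings. Extensive experimental results and case studies demonstrate that CoSEG achieves significant improvements in the quality of generated endings compared with state-of-the-art methods, and that it effectively customizes endings for different characters.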
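Color and structure are the two pillars that combine to give an image its meaning. Interested in the critical structures for neural network recognition, we isolate the influence of color by limiting the color space to just a few bits and look for structures that enable network recognition under such constraints. To this end, we propose a color quantization network, ColorCNN, which learns to structure an image in a limited color space by minimizing the classification loss. Building on the architecture and insights of ColorCNN, we introduce ColorCNN+, which supports multiple color space size configurations and addresses the earlier problems of poor recognition accuracy and undesirable visual fidelity under large color spaces. Through a novel imitation learning approach, ColorCNN+ learns to cluster colors as traditional color quantization methods do. This reduces overfitting and improves both visual fidelity and recognition accuracy under large color spaces. Experiments verify that ColorCNN+ achieves very competitive results in most circumstances, preserving both the key structures for network recognition and visual fidelity with accurate colors. We further discuss the differences between key structures and accurate colors, and their specific contributions to network recognition. As a potential application, we show that ColorCNNs can be used as image compression methods for network recognition.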
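For intuition about what a color space of a few bits means, the sketch below maps every pixel to the nearest entry of a small palette, which is the assignment step of traditional color quantization. It is not ColorCNN: the palette is fixed by hand here rather than learned or clustered, and the names are hypothetical.

```python
def nearest_color(pixel, palette):
    """Return the palette entry closest to `pixel` in squared RGB distance."""
    return min(
        palette,
        key=lambda color: sum((p - c) ** 2 for p, c in zip(pixel, color)),
    )


def quantize_image(image, palette):
    """Replace each pixel of a row-major RGB image with its nearest palette color."""
    return [[nearest_color(pixel, palette) for pixel in row] for row in image]


# A 1-bit color space holds 2**1 = 2 colors; this palette is an assumption.
palette_1bit = [(0, 0, 0), (255, 255, 255)]
image = [
    [(10, 20, 30), (250, 240, 230)],
    [(200, 200, 200), (90, 60, 30)],
]
quantized = quantize_image(image, palette_1bit)
```

A traditional method would choose the palette by clustering the image's colors (for example with k-means or median cut) before this assignment step, whereas a learned quantizer chooses it to minimize a downstream loss.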